Combining Feature and Model-Based Adaptation of RNNLMs for Multi-Genre Broadcast Speech Recognition
Authors

Abstract
Recurrent neural network language models (RNNLMs) have consistently outperformed n-gram language models when used in automatic speech recognition (ASR). This is because RNNLMs provide robust parameter estimation through the use of a continuous-space representation of words, and can generally model longer context dependencies than n-grams. The adaptation of RNNLMs to new domains remains an active research area, and the two main approaches are: feature-based adaptation, where the input to the RNNLM is augmented with auxiliary features; and model-based adaptation, which includes model fine-tuning and the introduction of adaptation layer(s) in the network. This paper explores the properties of both types of adaptation on multi-genre broadcast speech recognition. Two hybrid adaptation techniques are proposed, namely the fine-tuning of feature-based RNNLMs and the use of a feature-based adaptation layer. A method for the semi-supervised adaptation of RNNLMs, using topic-model-based genre classification, is also presented and investigated. The gains obtained with RNNLM adaptation on a system trained on 700 hours of speech are consistent for RNNLMs trained on both a small (10M words) and a large (660M words) text set, with 10% perplexity and 2% word error rate improvements on a 28.3-hour test set.
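To make the two adaptation families concrete, the sketch below shows an RNNLM whose word embedding is concatenated with an auxiliary genre/topic feature vector at each time step (feature-based adaptation), with an optional linear adaptation layer before the output (model-based adaptation). PyTorch, and all class names and dimensions, are assumptions for illustration only, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): an RNNLM whose input is
# augmented with an auxiliary genre/topic feature vector, plus an
# optional adaptation layer between the recurrent state and the softmax.
import torch
import torch.nn as nn

class FeatureAdaptedRNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, feat_dim=8,
                 hidden_dim=512, use_adaptation_layer=False):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Feature-based adaptation: auxiliary features are concatenated
        # with the word embedding at every time step.
        self.rnn = nn.LSTM(emb_dim + feat_dim, hidden_dim, batch_first=True)
        # Model-based adaptation: a small extra layer that can be
        # fine-tuned on in-domain data while the rest stays frozen.
        self.adapt = (nn.Linear(hidden_dim, hidden_dim)
                      if use_adaptation_layer else nn.Identity())
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, words, feats, state=None):
        # words: (batch, time) word ids; feats: (batch, time, feat_dim)
        x = torch.cat([self.embed(words), feats], dim=-1)
        h, state = self.rnn(x, state)
        return self.out(self.adapt(h)), state
```

The two hybrid techniques in the abstract then correspond to fine-tuning such a feature-augmented model on in-domain text, and to training an adaptation layer that itself consumes the auxiliary features.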
Similar Papers
Recurrent neural network language model adaptation for multi-genre broadcast speech recognition
Recurrent neural network language models (RNNLMs) have recently become increasingly popular for many applications, including speech recognition. In previous research, RNNLMs have normally been trained on well-matched in-domain data. The adaptation of RNNLMs remains an open research area. In this paper, genre- and topic-based RNNLM adaptation techniques are investigated for a multi-g...
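The genre/topic features used in such adaptation are typically derived from an unsupervised topic model over the training text. Below is a minimal sketch, assuming gensim's LDA implementation; the cited paper's exact topic model, vocabulary, and preprocessing are not specified here, so treat every detail as an assumption.

```python
# Sketch: deriving a topic-posterior feature vector per segment/show
# with LDA (gensim), to be used as the auxiliary RNNLM input.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [["football", "match", "goal"],
        ["election", "parliament", "vote"]]   # toy tokenised segments
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus, num_topics=4, id2word=dictionary, passes=10)

def topic_features(tokens, num_topics=4):
    """Dense topic posterior for one segment (zeros for absent topics)."""
    bow = dictionary.doc2bow(tokens)
    feats = [0.0] * num_topics
    for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
        feats[topic_id] = float(prob)
    return feats

print(topic_features(["goal", "match"]))
```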
Unsupervised Adaptation of Recurrent Neural Network Language Models
Recurrent neural network language models (RNNLMs) have been shown to consistently improve Word Error Rates (WERs) of large-vocabulary speech recognition systems employing n-gram LMs. In this paper we investigate supervised and unsupervised discriminative adaptation of RNNLMs in a broadcast transcription task to target domains defined by either genre or show. We have explored two approaches based...
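A minimal sketch of the unsupervised variant, in which the LM is fine-tuned on the recogniser's own 1-best hypotheses for the target genre or show, reusing the FeatureAdaptedRNNLM sketch above. The loss, optimiser, and learning rate are assumptions; the cited work uses discriminative criteria, which this plain cross-entropy fine-tune only approximates.

```python
# Sketch: unsupervised adaptation by fine-tuning on ASR 1-best
# hypotheses, with a small learning rate to avoid drifting too far
# from the generic model. Hyperparameters are illustrative.
import torch
import torch.nn as nn

def finetune_on_hypotheses(model, hyp_batches, lr=1e-4, epochs=1):
    criterion = nn.CrossEntropyLoss()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for words, feats, targets in hyp_batches:
            opt.zero_grad()
            logits, _ = model(words, feats)
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             targets.reshape(-1))
            loss.backward()
            opt.step()
    return model
```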
Semi-Supervised Adaptation of RNNLMs by Fine-Tuning with Domain-Specific Auxiliary Features
Recurrent neural network language models (RNNLMs) can be augmented with auxiliary features, which can provide an extra modality on top of the words. It has been found that RNNLMs perform best when trained on a large corpus of generic text and then fine-tuned on text corresponding to the sub-domain in which they are to be applied. However, in many cases the auxiliary features are available for the...
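One common way to realise such fine-tuning without forgetting the generic model is to freeze the generically trained parameters and update only the adaptation-specific ones. The sketch below builds on the FeatureAdaptedRNNLM class defined earlier; which parameters to unfreeze is a design choice assumed here, not something this abstract specifies.

```python
# Sketch: fine-tune only the adaptation layer on sub-domain text,
# freezing everything trained on the large generic corpus.
import torch

# Reusing the FeatureAdaptedRNNLM sketch from above.
model = FeatureAdaptedRNNLM(vocab_size=10000, use_adaptation_layer=True)

def freeze_except_adaptation(model):
    # Unfreeze only parameters of the "adapt" submodule; with
    # use_adaptation_layer=False there would be nothing to train.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("adapt")
    return [p for p in model.parameters() if p.requires_grad]

trainable = freeze_except_adaptation(model)
optimiser = torch.optim.Adam(trainable, lr=1e-3)
```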
Multifactor adaptation for Mandarin broadcast news and conversation speech recognition
We explore the integration of multiple factors such as genre and speaker gender for acoustic model adaptation tasks to improve Mandarin ASR system performance on broadcast news and broadcast conversation audio. We investigate the use of multifactor clustering of acoustic model training data and the application of MPE-MAP and fMPE-MAP acoustic model adaptations. We found that by effectively comb...
Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition
Recurrent neural network language models (RNNLMs) are powerful language modeling techniques. Compared to n-gram language models, significant performance improvements have been reported in a range of tasks, including speech recognition. Conventional n-gram and neural network language models are trained to predict the probability of the next word given its preceding context history. In contrast, bi...
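A bidirectional RNNLM instead scores each word using both its preceding and following context. Below is a minimal sketch, assuming PyTorch and a simple concatenation of forward and backward states; this is not the cited paper's exact architecture.

```python
# Sketch of a bidirectional RNNLM: word t is predicted from a forward
# state over words < t and a backward state over words > t, so the
# target word itself is excluded from both contexts.
import torch
import torch.nn as nn

class BiRNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.fwd = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.bwd = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, words):
        # words: (batch, time), padded with <s> ... </s>
        x = self.embed(words)
        h_f, _ = self.fwd(x)                        # h_f[t] covers words 0..t
        h_b, _ = self.bwd(torch.flip(x, dims=[1]))  # run right-to-left
        h_b = torch.flip(h_b, dims=[1])             # h_b[t] covers words t..T-1
        # For target position t in 1..T-2: left context is h_f[t-1],
        # right context is h_b[t+1].
        left = h_f[:, :-2, :]
        right = h_b[:, 2:, :]
        return self.out(torch.cat([left, right], dim=-1))  # scores words[:, 1:-1]
```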